
    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a small fraction of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end, we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Multilingual Adaptation of RNN Based ASR Systems

    In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we previously proposed Language Feature Vectors (LFVs) to train language-adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only at the feature level but also to deeper layers of the network. We therefore extend our previous approach by introducing a novel technique which we call "modulation". Based on this method, we modulate the hidden layers of RNNs using LFVs. We evaluate this approach in both full and low-resource conditions, as well as for grapheme- and phone-based systems. Modulation lowered error rates across all of these conditions. Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018).
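    The abstract does not say how the hidden layers are modulated. The sketch below is one possible reading and assumes that modulation means element-wise multiplicative gating of a hidden layer's outputs by the LFV components. The function name `modulate` and the requirement that both vectors have the same length are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def modulate(hidden: List[float], lfv: List[float]) -> List[float]:
    """Scale each hidden-unit output by the matching LFV component.

    Assumption: "modulation" is read here as element-wise multiplication;
    the abstract itself does not specify the operation.
    """
    if len(hidden) != len(lfv):
        raise ValueError("hidden layer and LFV must have the same length")
    return [h * g for h, g in zip(hidden, lfv)]


# Toy values: three hidden-unit outputs and a three-dimensional LFV.
hidden_out = [0.5, -1.0, 2.0]
language_vector = [1.0, 0.5, 2.0]
modulated = modulate(hidden_out, language_vector)
```

    Under this reading, the language vector acts as a per-unit gain, so the same network weights can respond differently depending on the language of the input.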

    Multilingual Articulatory Features


    Multi-stage Large Language Model Correction for Speech Recognition

    In this paper, we investigate the use of large language models (LLMs) to improve the performance of competitive speech recognition systems. Unlike traditional language models, which focus on a single data domain, LLMs offer the opportunity to push the limit of state-of-the-art ASR performance while also achieving higher robustness and generalizing effectively across multiple domains. Motivated by this, we propose a novel multi-stage approach that combines traditional language model re-scoring with LLM prompting. Specifically, the proposed method has two stages: the first stage uses a language model to re-score an N-best list of ASR hypotheses and runs a confidence check; the second stage prompts an LLM to perform ASR error correction on the less confident results from the first stage. Our experimental results demonstrate the effectiveness of the proposed method, showing a 10% to 20% relative improvement in WER over a competitive ASR system across multiple test domains. Comment: Submitted to ICASSP 202
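    The two-stage flow described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how scores are combined or how confidence is computed, so the linear score interpolation, the softmax-style confidence, the threshold value, and the callables `lm_score` and `llm_correct` are all hypothetical placeholders.

```python
import math
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, ASR log-score); higher is better


def rescore(nbest: List[Hypothesis],
            lm_score: Callable[[str], float],
            lm_weight: float = 0.5) -> List[Hypothesis]:
    """Stage 1a: add a weighted LM score to each ASR score, best first."""
    combined = [(text, asr + lm_weight * lm_score(text)) for text, asr in nbest]
    return sorted(combined, key=lambda item: item[1], reverse=True)


def top_confidence(rescored: List[Hypothesis]) -> float:
    """Stage 1b (assumed form): softmax posterior of the top hypothesis."""
    best = rescored[0][1]
    return 1.0 / sum(math.exp(score - best) for _, score in rescored)


def two_stage_decode(nbest: List[Hypothesis],
                     lm_score: Callable[[str], float],
                     llm_correct: Callable[[List[str]], str],
                     threshold: float = 0.8) -> str:
    """Keep confident results; send less confident ones to the LLM stage."""
    rescored = rescore(nbest, lm_score)
    if top_confidence(rescored) >= threshold:
        return rescored[0][0]
    # Stage 2: hypothetical LLM call that sees the ranked hypotheses.
    return llm_correct([text for text, _ in rescored])


# Toy stand-ins for the language model and the LLM (both hypothetical).
flat_lm = lambda text: 0.0
fake_llm = lambda hyps: hyps[0].upper()

confident = two_stage_decode(
    [("hello world", -1.0), ("hollow world", -9.0)], flat_lm, fake_llm)
uncertain = two_stage_decode(
    [("the cat sat", -1.0), ("the cat sad", -1.2)], flat_lm, fake_llm)
```

    In this toy run the first utterance has a clear winner and bypasses the LLM, while the second has two close hypotheses and is routed to the correction stage. Gating on confidence in this way limits LLM calls to the utterances where the first stage is unsure.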

    A Cornucopia of Iridium Nitrogen Compounds Produced from Laser‐Ablated Iridium Atoms and Dinitrogen

    The reaction of laser‐ablated iridium atoms with dinitrogen molecules and nitrogen atoms yields several neutral and ionic iridium dinitrogen complexes such as Ir(N2), Ir(N2)+, Ir(N2)2, Ir(N2)2−, and IrNNIr, as well as the nitrido complexes IrN, Ir(N)2 and IrIrN. These reaction products were deposited in solid neon, argon and nitrogen matrices and characterized by their infrared spectra. Assignments of vibrational bands are supported by ab initio and first-principles calculations as well as 14/15N isotope substitution experiments. The structural and electronic properties of the new dinitrogen and nitrido iridium complexes are discussed. While the formation of the elusive dinitrido complex Ir(N)2 was observed in a subsequent reaction of IrN with N atoms within the cryogenic solid matrices, the threefold coordinated iridium trinitride Ir(N)3 has not been observed so far.

    Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features

    In an increasingly globalized world, there is a rising demand for speech recognition systems. Systems for languages like English, German or French achieve decent performance, but there is a long tail of languages for which such systems do not yet exist. State-of-the-art speech recognition systems feature Deep Neural Networks (DNNs). Because DNNs are data-driven and therefore highly dependent on sufficient training data, a lack of resources directly affects recognition performance. Multiple techniques exist to deal with such resource-constrained conditions; one approach is the use of additional data from other languages. In the past, it was demonstrated that multilingually trained systems benefit from adding language feature vectors (LFVs) to the input features, similar to i-Vectors. In this work, we extend this approach by adding articulatory features (AFs). We show that AFs also benefit from LFVs and that multilingual system setups benefit from adding both AFs and LFVs. Treating English as a low-resource language, we restricted ourselves to only 10h of English acoustic training data. For system training, we use additional data from French, German and Turkish. By using a combination of AFs and LFVs, we were able to decrease the WER from 18.1% to 17.3% after system combination in our setup using a multilingual phone set.
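    For comparison with results that are reported as relative improvements, the absolute WER figures quoted in this abstract can be converted as follows. The subscripts "before" and "after" are labels introduced here for the two reported values, not terms from the paper.

```latex
\[
\frac{\mathrm{WER}_{\text{before}} - \mathrm{WER}_{\text{after}}}{\mathrm{WER}_{\text{before}}}
= \frac{18.1 - 17.3}{18.1}
= \frac{0.8}{18.1}
\approx 0.044
\]
```

    The reported change therefore corresponds to 0.8 percentage points absolute, or roughly 4.4% relative.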